-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
serialized parsers #1815
serialized parsers #1815
Conversation
jurgenvinju
commented
Jun 20, 2023
- removed public from all functions in ParseTree
- added interface for saving and loading parsers from disk without exposing the file internals (could be class files or grammar values, etc.)
…sing the file internals (could be class files or grammar values, etc.)
Codecov Report
@@ Coverage Diff @@
## main #1815 +/- ##
========================================
Coverage 48% 48%
- Complexity 6031 6056 +25
========================================
Files 670 670
Lines 58541 58660 +119
Branches 8536 8546 +10
========================================
+ Hits 28570 28673 +103
- Misses 27808 27815 +7
- Partials 2163 2172 +9
|
…ot about the nested classes
… the main class in the manifest
…ted parser consists of many subclasses each stored in a separate .class file
…pping reasons" This reverts commit 6dadd63.
@@ -125,7 +125,7 @@ | |||
<artifactId>rascal-maven-plugin</artifactId> | |||
<version>${rascal-maven.version}</version> | |||
<configuration> | |||
<errorsAsWarnings>false</errorsAsWarnings> | |||
<errorsAsWarnings>true</errorsAsWarnings> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is necessary due to double presence of rascal.jar during the tutor run. For some reason the old builtins are linked against the new code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have to fix this in the tutor maven-plugin somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and errors as warnings fixes this? interesting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well yes, the tutor fails then on the ParseTree module, but the build isn't stopped by this because those errors are reported as warnings only. When I do a full bootstrap, the error goes away again, but I'd rather not have te tutor's own run-time interfere with the run-time of the examples it is running.. very difficult in the case of rascal.jar. I believe it will require JVM module layers to fix this.
@@ -310,7 +310,7 @@ data Symbol | |||
data Symbol // <19> | |||
= \conditional(Symbol symbol, set[Condition] conditions); | |||
|
|||
public bool subtype(Symbol::\sort(_), Symbol::\adt("Tree", _)) = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ParseTree is one of the most visible modules in the library; by using public
everywhere it is priming our users to do the same, while we decided that public
was the default. Over the course of the next months, I'll be removing those modifiers from the library modules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah right, that's why this sneaked in ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, a guilty pleasure.
} catch (ClassCastException e) { | ||
throw new ImplementationError("parser generator:" + e.getMessage(), e); | ||
} catch (Throw e) { | ||
throw new ImplementationError("parser generator: " + e.getMessage() + e.getTrace()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can only test this by having a serious bug in the parser generator. This should never happen.
rascalValues.storeParsers(start, saveLocation); | ||
} | ||
catch (IOException e) { | ||
throw RuntimeExceptionFactory.io(e.getMessage()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be tested by writing to a non-existing folder or something. however we'd not be testing the actual code that we wrote here, just the URIResolverRegistry.
{ | ||
return R<0> + R<1>; | ||
return R.from + R.to; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is an accident; I was testing the type-checker for another diagnose run.
undo debug fix
more undo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cleaned up some unrelated changes. otherwise I tested it on several smaller and larger grammars. The speed-up of using a saved parser against loading the generator and generating one is around 1000 times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good to me, nice first step! I think you want to build the concrete syntax on top of this in a new branch?
@@ -125,7 +125,7 @@ | |||
<artifactId>rascal-maven-plugin</artifactId> | |||
<version>${rascal-maven.version}</version> | |||
<configuration> | |||
<errorsAsWarnings>false</errorsAsWarnings> | |||
<errorsAsWarnings>true</errorsAsWarnings> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and errors as warnings fixes this? interesting
compile(fileMap, diagnostics); | ||
|
||
// side-effect alert: | ||
// now the local classloader contains the .class file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a new side-effect? or was this always a side-effect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was always there, and necessary for the generated sub-classes bytecode to become available when instantiating the parser class object. However, normally we'd not ignore the return class and we would not be reading ourselves directly from the virtual file system later. Definitely not purely functional
methods here.
manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, qualifiedClassName); | ||
|
||
try (JarOutputStream target = new JarOutputStream(output, manifest)) { | ||
for (Entry<String, JavaFileObject> entry : classes.entrySet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this saves all previously compiled classes to the file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but the compiler is organized to be instantiated per run. So you get all the class files generated from compiling one Java source file loaded into one virtual file system that is wrapped by one classloader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of parser generation, this entails one main file and as many sub-classes as there are non-terminals (including lists etc.) in the grammar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you need to compile several collaborating source files, you can use 1 instance and still save the class files into one comprehensive jar file that will work if you use the load
facility here to get the files back.
Note that I have chosen to not use .class
file extensions and not change .
to /
to create a JVM folder structure. The reason is that this encoding is to remain opaque to our users, and maximally fast to deserialize. If we switch parsing technology, we may not even store our parsers as jars or class files.
var file = new JavaFileObjectImpl(className, JavaFileObject.Kind.CLASS); | ||
|
||
try (var fo = file.openOutputStream()) { | ||
fo.write(Prelude.consumeInputStream(jarIn)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could use a regular copy loop, to avoid loading it all into memory first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but so far this code has proven to be blazingly fast so I thought this is nice and short as it is. We average around 4 ms for storing or reading a medium sized grammar (so that includes reading and writing all byte arrays for the JVM bytecode of all sub-classes and the main parser class.
* after `storeParsers(g, file);` | ||
* then `g = loadParsers(file);` | ||
* and given `h = parsers(g);` | ||
* then for all valid `nt`, `input` and `origin`: `g(nt, input, origin) == g(nt, input, origin)` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both side of the ==
is the exact same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100% exact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah. h
} | ||
@pitfalls{ | ||
* reifiying types (use of `#`) will trigger the loading of a parser generator anyway. You have to use | ||
this notation for types to avoid that: `type(\start(sort("MySort")), ())` to avoid the computation for `#start[A]` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good tip.
I guess any concrete syntax will still trigger it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any type will trigger it. That's because the translation from concrete syntax types to abstract type symbols is implemented only once in lang::rascal::grammar::definition::Symbols
.
@@ -1149,6 +1149,16 @@ public static String consumeInputStream(Reader in) throws IOException { | |||
} | |||
return res.toString(); | |||
} | |||
|
|||
public static byte[] consumeInputStream(InputStream in) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be some places in Prelude we can replace with this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We did that before; those were all Readers
I believe.
storeParsers(#start[A], |test-temp:///parsers.jar|); | ||
p = loadParsers(|test-temp:///parsers.jar|); | ||
|
||
return p(type(\start(sort("A")), ()), "a", |origin:///|) == parse(#start[A], "a", |origin:///|); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice test :) can we also save a file somewhere? and load it back in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the question. The test saves the file in |test-temp:///parsers.jar|
and then loads it back from that file, assigning the parser function into g
. Then it tests if that recovered parser function behaves the same as a freshly generated parser function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but this code already has a #start[A]
in there. So what if we load a grammar that isn't reified in the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent question. I've coded that scenario in the @examples of the documentation, because there I can freely start a fresh REPL that has no declaration memory. I can not do that here unfortunately. Nevertheless, yes that scenario works 100%. It's a bit weird, because you do get parse trees for types that have never been declared. But, it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason that it works is that the parsers
function and the storeParsers/loadParsers
function generate functions that take only the type as parameter and ignore the grammar part of the type values. The grammar has already been used by the store
function, and the start type is only used to decide which part of the grammar to activate.
Yes; but I'd like to see first if this does not solve your immediate problem already. I mean it could lead to immediate highlighting in the editor without that additional fix. Of course, it would be even faster if we also parse modules with concrete syntax like this, but perhaps this does not have to be done in such a rush. There are so many ways to do that; I'd like to think a bit about that before committing to a strategy that could lead to future problems. |
fixed typo thanks to @DavyLandman
same typo in documentation clone.
Well, only if I would change the interface of Rascal-LSP to allow passing in these pre-loaded parsers. Otherwise I still have to wait for the imports to finish before we get the parser handed to rascal-lsp. |
Right, so the main module that imports Well. yep. Then this independent feature is now a plateau we can build upon. @PaulKlint It will also help in writing the compiler, because now the compiler can call parser generation at compile time, store the parsers in the target folder, and those parsers can be loaded at run-time again very quickly. |
The original idea of that issue could still work for this? Register language gets an optional argument that you can pass in a serialized parser. Then in rascal-lsp we can preload the contribution with that, and then do the loading of the module in the background as we did before. Just a shorter time until the big wall of white/black text is gone. |
I have to think better about this. As it stands that solution does not work. You need more than the serialized parser to produce a parse tree; first of all also the top-nonterminal. If we'd pass a function to resolve that issue we'd run into the previous loading issue, so that solution would require yet another field in Another hesitation is that we've abstracted For that reason I'd rather have the modules load fast, and we can keep the interface clean. So I'd like to think about that for a little longer. |
Okay that makes sense, I would also appreciate an approach that doesn't require all kinds of bespoke solutions. |